teacher algorithm
Evolution Transformer: In-Context Evolutionary Optimization
Lange, Robert Tjarko, Tian, Yingtao, Tang, Yujin
Evolutionary optimization algorithms are often derived from loose biological analogies and struggle to leverage information obtained during the sequential course of optimization. An alternative promising approach is to leverage data and directly discover powerful optimization principles via meta-optimization. In this work, we follow such a paradigm and introduce Evolution Transformer, a causal Transformer architecture, which can flexibly characterize a family of Evolution Strategies. Given a trajectory of evaluations and search distribution statistics, Evolution Transformer outputs a performance-improving update to the search distribution. The architecture imposes a set of suitable inductive biases, i.e. the invariance of the distribution update to the order of population members within a generation and equivariance to the order of the search dimensions. We train the model weights using Evolutionary Algorithm Distillation, a technique for supervised optimization of sequence models using teacher algorithm trajectories. The resulting model exhibits strong in-context optimization performance and shows strong generalization capabilities to otherwise challenging neuroevolution tasks. We analyze the resulting properties of the Evolution Transformer and propose a technique to fully self-referentially train the Evolution Transformer, starting from a random initialization and bootstrapping its own learning progress. We provide an open source implementation under https://github.com/RobertTLange/evosax.
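The two inductive biases named in the abstract can be illustrated on a hand-written Evolution Strategy. The sketch below is not the Evolution Transformer and does not use the evosax API; the objective `sphere`, the helpers `update_mean` and `es_step`, and all hyperparameters are hypothetical choices made for illustration. It shows a search-distribution update that is invariant to the order of population members and equivariant to the order of the search dimensions.

```python
import random


def sphere(x):
    """Toy objective to minimize; chosen for illustration, not taken from the paper."""
    return sum(v * v for v in x)


def update_mean(population, fitness, elite):
    """Return the average of the `elite` lowest-fitness members."""
    ranked = sorted(zip(fitness, population), key=lambda pair: pair[0])[:elite]
    dim = len(population[0])
    return [sum(x[d] for _, x in ranked) / elite for d in range(dim)]


def es_step(mean, sigma, rng, popsize=16, elite=4):
    """One generation: sample around the mean, evaluate, update the mean."""
    population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean] for _ in range(popsize)]
    fitness = [sphere(x) for x in population]
    return update_mean(population, fitness, elite), population, fitness


rng = random.Random(0)
start = [3.0, -2.0, 1.0]
mean = start
for _ in range(50):
    mean, population, fitness = es_step(mean, 0.3, rng)

# Shuffling population members (together with their fitness) leaves the update unchanged.
order = list(range(len(population)))
rng.shuffle(order)
shuffled = update_mean([population[i] for i in order], [fitness[i] for i in order], 4)

# Permuting the search dimensions permutes the update in the same way.
perm = [2, 0, 1]
permuted = update_mean([[x[p] for p in perm] for x in population], fitness, 4)
```

A learned update rule has to get these symmetries from its architecture, whereas here they follow directly from ranking by fitness and averaging per dimension.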
Training Reinforcement Learning Agents and Humans With Difficulty-Conditioned Generators
Tio, Sidney, Ho, Jimmy, Varakantham, Pradeep
We introduce Parameterized Environment Response Model (PERM), a method for training both Reinforcement Learning (RL) Agents and human learners in parameterized environments by directly modeling difficulty and ability. Inspired by Item Response Theory (IRT), PERM aligns environment difficulty with individual ability, creating a Zone of Proximal Development-based curriculum. Remarkably, PERM operates without real-time RL updates and allows for offline training, ensuring its adaptability across diverse students. We present a two-stage training process that capitalizes on PERM's adaptability, and demonstrate its effectiveness in training RL agents and humans in an empirical study.
Figure 1: Overview of the proposed 2-stage process. In Stage 1, the IRT-based Parameterized Environment Response Model (PERM) observes a Reinforcement Learning (RL) Agent as it trains in a given environment with randomized levels. During this stage, PERM learns to accurately infer both student ability and level difficulty. In Stage 2, once trained, PERM is deployed to train both artificial and human students. It achieves this by inferring their current ability and providing suitable training levels within the same domain.
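The idea of matching difficulty to ability can be sketched with the classic one-parameter (Rasch) IRT model, in which the probability of success depends only on the gap between ability and difficulty. This is a minimal illustration under stated assumptions, not PERM itself: the functions `p_success` and `pick_level`, the candidate difficulties, and the 50% target success rate are all hypothetical.

```python
import math


def p_success(ability, difficulty):
    """Rasch (1PL) model: logistic function of ability minus difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def pick_level(ability, difficulties, target=0.5):
    """Choose the level whose predicted success rate is closest to `target`.

    The 0.5 default is an assumption standing in for "neither too easy nor too hard".
    """
    return min(difficulties, key=lambda d: abs(p_success(ability, d) - target))


levels = [-1.0, 0.0, 0.5, 2.0]      # hypothetical level difficulties
chosen = pick_level(0.3, levels)    # level offered to a student of ability 0.3
```

In this toy setting the chosen level is simply the one whose difficulty is nearest the student's ability; a system like the one described would first have to infer both quantities from observed play.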
Issue #73 H+ Weekly
This week: a massive investment in the world's first neural prosthetic for human intelligence enhancement, a Chinese version of DARPA's Grand Challenge, MIT is ready for Halloween with a horror AI, a self-driving truck delivered a precious cargo of 50,000 beers, and more! Bryan Johnson announced he's investing $100M into Kernel, a company that built the world's first neural prosthetic for human intelligence enhancement. Bryan said he did it "in an effort to enhance human intelligence and reimagine our future. Unlocking our brain is the most significant and consequential opportunity in history -- and it's time sensitive." Wired takes a closer look at three teams that competed at the first Cyborg Olympics: one from the exoskeleton race, one from the arm prosthetic competition, and one that stimulated paralyzed muscles in a cycling race.
AI can learn from data without ever having access to it
In recent months, security researchers have shown that machine learning algorithms can be reverse-engineered and made to expose user data, like personal photos or health data. So how can we protect that information? New research from OpenAI and Google shows a way to build AI that never sees personal data, but is able to function as if it had. Ian Goodfellow, a researcher at OpenAI, compares the system to medical school. "The doctors who teach in medical school have learned everything they know from decades of experience working with specific individual people, and as a side effect they know a lot of private medical histories," Goodfellow says.